Complex Observation processes – Distance sampling with inlabru
Distance sampling is a family of related methods for estimating the abundance and spatial distribution of wildlife populations.
Distance sampling is based on the idea that animals further away from observers are harder to detect than animals that are nearer.
This idea is implemented in the model as a detection function that depends on distance.
Coupling distance sampling data with spatial modelling allows maps of spatially varying density to be produced.
Traditionally, this is achieved in a two-stage approach: (i) point estimates of detectability are used to create an offset, which is then (ii) used within a GLM or GAM for count response data.
This requires binning the data into counts based on some discretisation of space.
A major downside to this approach is that uncertainty from the detection model is not propagated to the second-stage spatial model.
The LGCP offers a flexible approach that can include spatial covariates to model the mean intensity, and a spatially structured random effect to account for unexplained heterogeneity not captured by the covariates.
To account for the imperfect detection of points, we specify a thinning probability function \[ g(s) = \text{Prob}(\text{a point at } s \text{ is detected} \mid \text{a point is at } s) \]
Key property: an LGCP with intensity \(\lambda(s)\) that is thinned by a probability function \(g(s)\) is itself an LGCP, with intensity:
\[ \underbrace{\tilde{\lambda}(s)}_{\text{observed process}} = \underbrace{\lambda(s)}_{\text{true process}} \times \underbrace{g(s)}_{\text{thinning probability}} \]
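As a quick illustration (not from the original material), the thinning property can be checked by simulating a homogeneous Poisson process and thinning it with a distance-based detection probability; all numeric values below are arbitrary choices:

```r
set.seed(1)
# simulate a homogeneous Poisson process on [0, 60] with intensity lambda
lambda <- 2
n <- rpois(1, lambda * 60)
s <- runif(n, 0, 60)
# detection probability decays with distance to the nearest transect
transects <- c(10, 30, 50)
d <- vapply(s, function(x) min(abs(x - transects)), numeric(1))
g <- exp(-0.5 * (d / 2)^2)      # g(s) = 1 on the transects
# thin: keep each point independently with probability g(s)
detected <- s[runif(n) < g]
# 'detected' is a realisation of a Poisson process with intensity lambda * g(s)
```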
Let's visualize this in 1D:
[Figure: intensity (density) function with simulated points and transect locations]
Detection function \(\color{red}{g(s)}\)
Here \(\color{red}{g(s) = 1}\) on the transects (at x = 10, 30 and 50).
The detection function describes the probability \(\color{red}{p(s)}\) that a point is detected.
Observations are from a thinned Poisson process with intensity \(\lambda(s) \color{red}{p(s)}\)
Half-normal: \(g(\mathbf{s}|\sigma) = \exp(-0.5 (d(\mathbf{s})/\sigma)^2)\)
Hazard-rate: \(g(\mathbf{s}|\sigma) = 1 - \exp(-(d(\mathbf{s})/\sigma)^{-1})\)
\[ \pi(\mathbf{s_1},\ldots,\mathbf{s_m}) = \exp\left( |\Omega| - \int_{\mathbf{s}\in\Omega}\lambda(s)g(s)\text{d}s \right) \prod_{i=1}^m \lambda(\mathbf{s}_i)g(\mathbf{s}_i) \]
To make \(g(s)\) and \(\lambda(s)\) identifiable, we assume intensity is constant with respect to distance from the observer.
If the strip width (\(2W\)) is narrow compared to the study region (\(\Omega\)), we can treat the strips as lines.
We need to:

- adjust the intensity at a point \(\mathbf{s}\) on the line to take account of the actual width of the strip;
- adjust the thinning probability to account for having collapsed all points onto the line.
The intensity at a point \(\mathbf{s}\) on the line becomes \(2W\lambda(s)\) instead of \(\lambda(s)\).
Let \(\pi(d)\) be the probability that the point is at a distance \(d\) from the line.
Let \(p(d)\) be the probability that it is detected given it is at distance \(d\).
Then the thinning probability becomes \(\pi(d)\times p(d)\). Assuming the points are uniformly distributed within the strip, \(\pi(d) = 1/W\) (the density of distances is constant on the interval \([0,W]\)).
This updates our thinned intensity to \[ \log \tilde{\lambda}(s) = \underbrace{\mathbf{x}'\beta + \xi(s)}_{\log \lambda(s)} + \log p(d) + \log(2W) + \log(1/W), \] where the last two terms combine to \(\log 2\), the constant that appears in the model predictor.
inlabru can help via a fixed-point iteration scheme (further details are available in this vignette).
In the next example, we will explore data from a combination of several NOAA shipboard surveys of pan-tropical spotted dolphins in the Gulf of Mexico.
A total of 47 observations of groups of dolphins were detected. The group size was recorded, as well as the Beaufort sea state at the time of the observation.
The transect width is 16 km, i.e. the maximal detection distance is 8 km (transect half-width \(W = 8\) km).
First, we need to create the mesh used to approximate the random field. We can either:

- build a non-convex boundary around the observations with the fm_nonconvex_hull function and pass it to fm_mesh_2d, or
- take an existing sf boundary and specify this directly in the mesh construction via the fm_mesh_2d function.

Both functions are from the fmesher package; the key arguments are max.edge for the maximum triangle edge lengths and cutoff to avoid overly small triangles in clustered areas.

We start by plotting the distances and a histogram of frequencies in distance intervals.
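A sketch of what the mesh construction might look like; the numeric settings (convex, max.edge, cutoff, in km) are illustrative assumptions, not values from the original analysis:

```r
library(fmesher)

# non-convex boundary around the observation locations
bnd <- fm_nonconvex_hull(mexdolphin$points, convex = -0.1)

# mesh with finer triangles inside the boundary, coarser outside,
# and a cutoff so clustered points do not create tiny triangles
mesh <- fm_mesh_2d(
  boundary = bnd,
  max.edge = c(50, 100),
  cutoff = 10
)
```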
Then, we need to define a half-normal detection probability function. This must take distance as its first argument and the linear predictor of the sigma parameter as its second:
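A minimal sketch of such a function, matching the hn() call used in the model predictor below (here sigma enters on its natural scale, consistent with the marginal mapper in the component definition):

```r
# half-normal detection probability:
# first argument the distance, second the sigma parameter
hn <- function(distance, sigma) {
  exp(-0.5 * (distance / sigma)^2)
}
hn(0, sigma = 2)  # -> 1: detection is certain at distance zero
```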
The LGCP Model
\[ \begin{aligned} p(\mathbf{y} \mid \lambda) & \propto \exp \left( -\int_\Omega \lambda(\mathbf{s}) p(\mathbf{s}) \,\mathrm{d}\mathbf{s} \right) \prod_{i=1}^n \lambda(\mathbf{s}_i) p(\mathbf{s}_i) \\ \eta(s) & = \beta_0 + \omega(s) + \log p(s) \end{aligned} \]
The code

```r
# define model components
cmp <- ~ Intercept(1) +
  space(main = geometry, model = spde_model) +
  sigma(1,
    prec.linear = 1,
    marginal = bru_mapper_marginal(qexp, pexp, dexp, rate = 1 / 8)
  )

# define the model predictor
eta <- geometry + distance ~ space +
  log(hn(distance, sigma)) +
  Intercept + log(2)

# build the observation model
lik <- bru_obs("cp",
  formula = eta,
  data = mexdolphin$points,
  ips = ips
)

# fit the model
fit <- bru(cmp, lik)
```
We can use fit$summary.fixed and fit$summary.hyperpar to obtain posterior summaries of the model parameters.
|   | mean | 0.025quant | 0.975quant |
|---|---|---|---|
| Intercept | −8.41 | −9.48 | −7.63 |
| sigma | −0.05 | −0.46 | 0.36 |
| Range for space | 130.50 | 42.39 | 313.09 |
| Stdev for space | 1.19 | 0.73 | 1.79 |
The spde.posterior function allows us to plot the posterior densities of the Matérn field parameters.
To map the spatial intensity, we first need to define a grid of points where we want to predict. We use:

- fm_pixels(), which creates a regular grid of points covering the mesh;
- the predict function, which takes as input the fitted model object (fit), the prediction points (pxl), and a formula for the quantity of interest.

We can look at the posterior for the mean expected number of dolphins. Remember, the number of dolphins over the whole domain \(\Omega\) is \[ N(\Omega)\sim\text{Poisson}(E_{\Omega}), \text{ with } E_{\Omega} = \int_{\Omega}\lambda(s)\,\text{d}s \] so the expected mean number of dolphins is \[ E_{\Omega} = \int_{\Omega}\lambda(s)\,\text{d}s = \int_{\Omega}\exp\{\beta_0+\omega(s)\}\,\text{d}s \]
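A sketch of the prediction step under these assumptions: pxl is the pixel grid, and the formula combines the Intercept and space components defined above (the grid dimensions are an arbitrary choice):

```r
# regular grid of prediction points covering the mesh
pxl <- fm_pixels(mesh, dims = c(100, 100))

# posterior summaries of the intensity surface lambda(s) on the grid
pred <- predict(fit, pxl, ~ exp(Intercept + space))
```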
Estimate:
If we want to predict the expected counts, we can also get Monte Carlo samples for the expected number of dolphins as follows:
[1] 601 10
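The sampling call itself is not shown above; a hedged sketch using inlabru's generate() function might look like the following, where ips are the integration points with weights and n.samples is an arbitrary choice:

```r
# joint posterior samples of E_Omega = int_Omega lambda(s) ds,
# approximated by a weighted sum over the integration points
E_samples <- generate(
  fit, ips,
  ~ sum(weight * exp(Intercept + space)),
  n.samples = 100
)
```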